11  R: Objects

11.1 Atomic vectors

An atomic vector in R is a vector containing objects of the same datatype. If the objects are not of the same datatype, they are coerced to a common datatype. An atomic vector is created using the function c().

numbers = c(1,2,67)
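The coercion rule can be checked with the class() function. The sketch below uses hypothetical vectors (the names mixed_chr, mixed_num and mixed_lgl are made up for illustration) to show which datatype results when types are mixed.

```r
# Hypothetical vectors illustrating coercion inside c()
mixed_chr <- c(1, "a", TRUE)    # one character element makes every element character
mixed_num <- c(1.5, TRUE, 2L)   # logical and integer values are promoted to double
mixed_lgl <- c(TRUE, FALSE)     # all elements already share a type, so no coercion

class(mixed_chr)   # "character"
class(mixed_num)   # "numeric"
class(mixed_lgl)   # "logical"
mixed_num          # 1.5 1.0 2.0
```

Coercion always moves towards the more general type: logical, then integer, then double, then character.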

The in-built R function length() is used to find the length of an atomic vector.

length(numbers)
[1] 3

11.1.1 Slicing the atomic vector

An atomic vector can be sliced using the indices of the elements within [] brackets.

For example, consider the vector:

vec<-1:40

Suppose we wish to get the \(3^{rd}\) element of the vector. We can get it using the index 3:

vec[3]
[1] 3

A sequence of consecutive elements can be sliced by placing the indices of the first and the last element on either side of the : operator. For example, let us slice the elements from the \(3^{rd}\) to the \(10^{th}\) element of the vector vec:

vec[3:10]
[1]  3  4  5  6  7  8  9 10

We can slice elements at different indices by putting the indices in an atomic vector within the [] brackets. Let us slice the \(4^{th}\), \(7^{th}\), and \(18^{th}\) elements of the vector vec:

vec[c(4,7,18)]
[1]  4  7 18

We can slice consecutive elements, and non-consecutive elements simultaneously. Let us slice the elements from the \(4^{th}\) index to the \(9^{th}\) index and the \(30^{th}\) and \(36^{th}\) element.

vec[c(4:9,30,36)]
[1]  4  5  6  7  8  9 30 36

11.1.2 Removing elements from atomic vector

Elements can be removed from the vector using the negative sign within [] brackets.

Remove the 2nd element from the vector:

vec<-1:5
vec[-2]
[1] 1 3 4 5

If multiple elements need to be removed, the indices of the elements to be removed can be given as an atomic vector.

Remove elements 2 to 6 and element 10 from the vector:

vec<-1:20
vec[-c(2:6,10)]
 [1]  1  7  8  9 11 12 13 14 15 16 17 18 19 20
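Note that negative indexing returns a new vector and leaves the original unchanged. A minimal sketch of making the removal permanent by reassigning the result:

```r
vec <- 1:5
vec[-2]          # returns 1 3 4 5, but vec itself is not modified
length(vec)      # 5

vec <- vec[-2]   # reassign the result to actually drop the element
length(vec)      # 4
```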

Example: USA’s GDP per capita from 1960 to 2021 is given by the vector G in the code chunk below. The values are arranged in ascending order of the year, i.e., the first value is for 1960, the second value is for 1961, and so on. Store the years in which the GDP per capita of the US increased by more than 10%, in a vector.

G = c(3007, 3067, 3244, 3375,3574, 3828, 4146, 4336, 4696, 5032,5234,5609,6094,6726,7226,7801,8592,9453,10565,11674,12575,13976,14434,15544,17121,18237,19071,20039,21417,22857,23889,24342,25419,26387,27695,28691,29968,31459,32854,34515,36330,37134,37998,39490,41725,44123,46302,48050,48570,47195,48651,50066,51784,53291,55124,56763,57867,59915,62805,65095,63028,69288)
years<-c()
for(i in 1:(length(G)-1))
{
  diff = (G[i+1]-G[i])/G[i]
  if(diff>0.1){years<-c(years,1960+i)}
}
print(years)
[1] 1973 1976 1977 1978 1979 1981 1984

11.1.3 The seq() function

The seq() function is used to generate an atomic vector consisting of a sequence of numbers with a constant gap. For example, the code below generates a sequence of integers starting from 20 up to 60 in steps of 5.

seq(20,60,5)
[1] 20 25 30 35 40 45 50 55 60
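The gap need not be a positive integer. The sketch below shows a few other ways of calling seq() using its by and length.out arguments; the variable names s1, s2 and s3 are chosen only for illustration.

```r
s1 <- seq(0, 1, by = 0.25)          # fractional step
s2 <- seq(20, 60, length.out = 5)   # fix the number of values instead of the step
s3 <- seq(10, 2, by = -4)           # negative step counts downwards

s1   # 0.00 0.25 0.50 0.75 1.00
s2   # 20 30 40 50 60
s3   # 10 6 2
```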

11.2 Matrix

Matrices are two-dimensional arrays. The in-built function matrix() is used to define a matrix. An atomic vector can be organized as a matrix by specifying the number of rows and columns.

For example, let us define a 2x3 matrix (2 rows and 3 columns) consisting of consecutive integers from 1 to 6.

mat<-matrix(1:6,2,3)
mat
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

Note that the integers fill up column-wise in the matrix. If we wish to fill-up the matrix by row, we can use the byrow argument.

mat<-matrix(1:6,2,3, byrow = TRUE)
mat
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

The functions nrow() and ncol() can be used to get the number of rows and columns of the matrix respectively.

nrow(mat)
[1] 2
ncol(mat)
[1] 3
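Both dimensions can also be obtained in one call with the base R function dim(). A short sketch, recreating the same matrix so that it runs on its own:

```r
mat <- matrix(1:6, 2, 3, byrow = TRUE)

dim(mat)      # 2 3  (number of rows, then number of columns)
dim(mat)[1]   # 2, the same value as nrow(mat)
length(mat)   # 6, the total number of elements
```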

Matrices can be sliced using the row and column indices, separated by a comma, within [] brackets. Suppose we wish to get the element in the \(2^{nd}\) row and \(3^{rd}\) column of the matrix:

mat[2,3]
[1] 6

For selecting all rows or all columns of a matrix, the index for the row/column can be left blank. Suppose we wish to get all the elements of the \(1^{st}\) row of the matrix:

mat[1,]
[1] 1 2 3

Rows and columns of the matrix can be sliced using the : operator. Suppose we want to select a sub-matrix that has elements in the first two rows and columns 2 and 3 of the matrix mat:

mat[1:2,2:3]
     [,1] [,2]
[1,]    2    3
[2,]    5    6
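When a slice contains only one row or one column, R returns a plain vector rather than a matrix. If a matrix is needed, the drop argument of [] can be set to FALSE. A minimal sketch (row_vec and row_mat are illustrative names):

```r
mat <- matrix(1:6, 2, 3, byrow = TRUE)

row_vec <- mat[1, ]                 # dimensions are dropped: a plain vector
row_mat <- mat[1, , drop = FALSE]   # dimensions are kept: a 1 x 3 matrix

is.matrix(row_vec)   # FALSE
dim(row_mat)         # 1 3
```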

Suppose we need to compute the sum of each row of the matrix. We can do it using a for loop as follows:

row_sum<-c(0,0)
for(i in 1:nrow(mat))
{
  for(j in 1:ncol(mat))
  {
    row_sum[i]<-row_sum[i]+mat[i,j]
  }
}
row_sum
[1]  6 15

Observe that in the above for loop, the elements of each row are added one at a time. We can add all the elements of a row in a single step using the sum() function. This removes the inner for loop from the above code:

row_sum<-c(0,0)
for(i in 1:nrow(mat))
{
    row_sum[i]<-sum(mat[i,])
}
row_sum
[1]  6 15

In the above code, all the elements of a row are summed in a single call. However, we still need a for loop to process the rows one at a time.

11.2.1 The apply() function

The apply() function can be used to apply a function to every row or every column of a matrix. Thus, it replaces an explicit for loop over the rows or columns of the matrix with a single call, which makes the code shorter and easier to read. Note that apply() still calls the function once per row or column internally and does not run the calls in parallel, so a reduction in execution time compared to a for loop is not guaranteed.

Let us use the apply() function to sum up all the rows of the matrix mat.

apply(mat,1,sum)
[1]  6 15
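The second argument of apply() selects the dimension to iterate over: 1 for rows and 2 for columns. The sketch below sums the columns instead, and also shows the base R functions rowSums() and colSums(), which cover the specific case of sums. The names col_sum and row_sum_fast are chosen for illustration.

```r
mat <- matrix(1:6, 2, 3, byrow = TRUE)

col_sum <- apply(mat, 2, sum)   # 2 means "apply to each column"
col_sum                         # 5 7 9

row_sum_fast <- rowSums(mat)    # dedicated function for row sums
row_sum_fast                    # 6 15
colSums(mat)                    # 5 7 9
```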

Let us compare the time taken to sum up rows of a matrix using a for loop with the time taken using the apply() function.

options(digits.secs = 6)
start.time <- Sys.time()
row_sum<-c(0,0)
for(i in 1:nrow(mat))
{
    row_sum[i]<-sum(mat[i,])
}
row_sum
[1]  6 15
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
Time difference of 0.006053209 secs
start.time <- Sys.time()
apply(mat,1,sum)
[1]  6 15
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
Time difference of 0.00194788 secs

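Observe that in this run, the apply() function took less time than the for loop to sum up the rows of the matrix. For a matrix this small, both timings are only a few milliseconds and include overheads such as printing the result, so they will vary from run to run.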
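Timings from a single run on a 2x3 matrix are noisy. A somewhat steadier comparison, sketched below, uses the base R function system.time() on a larger matrix. The matrix big, its size, and the variable names are assumptions made for illustration; rowSums() is included as a third option for reference.

```r
set.seed(1)
big <- matrix(runif(200000), nrow = 2000, ncol = 100)   # hypothetical test matrix

# Row sums with an explicit for loop
loop_sum <- numeric(nrow(big))
t_loop <- system.time(
  for (i in 1:nrow(big)) loop_sum[i] <- sum(big[i, ])
)

# Row sums with apply() and with rowSums()
t_apply   <- system.time(apply_sum <- apply(big, 1, sum))
t_rowsums <- system.time(vec_sum <- rowSums(big))

# All three approaches give the same row sums (up to floating-point rounding)
all.equal(loop_sum, apply_sum)
all.equal(loop_sum, vec_sum)

# Elapsed times depend on the machine and the run, so no values are shown here
c(loop    = t_loop[["elapsed"]],
  apply   = t_apply[["elapsed"]],
  rowSums = t_rowsums[["elapsed"]])
```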

Recall the earlier example where we computed the years in which the increase in GDP per capita was more than 10%. Let us use matrices to solve the problem. We’ll also compare the time it takes using a matrix with the time it takes using for loops.

start.time <- Sys.time()

#Let the first column of the matrix be the GDP of all the years except 1960, and the second column be the GDP of all the years except 2021.
GDP_mat<-matrix(c(G[-1],G[-length(G)]),length(G)-1,2)

#The percent increase in GDP can be computed by performing computations using the 2 columns of the matrix
inc<-(GDP_mat[,1]-GDP_mat[,2])/GDP_mat[,2]
years<-1961:2021
years<-years[inc>0.1]
years
[1] 1973 1976 1977 1978 1979 1981 1984
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
Time difference of 0.004628658 secs

Without matrices, the time taken to perform the same computation is measured with the code below.

start.time <- Sys.time()
years<-c()
for(i in 1:(length(G)-1))
{
  diff = (G[i+1]-G[i])/G[i]
  if(diff>0.1){years<-c(years,1960+i)}
}
print(years)
[1] 1973 1976 1977 1978 1979 1981 1984
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
Time difference of 0.009580612 secs

Observe that in this run, the matrix-based code took less time. The computation on the columns of the matrix is vectorized, i.e., a single expression operates on entire columns at once, in contrast to a for loop where the computation is performed on one element at a time.
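The same year-over-year computation can also be written on the vector directly with the base R function diff(), without building a matrix. The sketch below uses a short hypothetical series (values and first_year are made-up names and numbers, not the GDP data) so that the result can be checked by hand.

```r
# Hypothetical yearly values; the first one corresponds to first_year
values <- c(100, 105, 120, 125, 140)
first_year <- 2000

# diff(values)[i] equals values[i + 1] - values[i]; divide by the earlier value
growth <- diff(values) / values[-length(values)]

# Year of the later value in each consecutive pair
years <- first_year + seq_along(growth)

years[growth > 0.1]   # 2002 2004
```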

Sometimes, the computations on rows / columns of a matrix are not straightforward and we may need to use the apply() function to apply a function on each row / column of a matrix.

Example: Find the maximum GDP per capita of the US in each of the 5 year periods starting from 1961-1965, and up to 2016-2020.

GDP_5year<-matrix(G[-c(1,length(G))],12,5,byrow = TRUE)
GDP_max_5year<-apply(GDP_5year,1,max)

In the above code, we applied the in-built function max on all the rows. Sometimes, an in-built function may not be available for the computations to be performed. In such a case, we can write our own user-defined function within the apply() function. See the example below.

Example: Find the range (max-min) of GDP per capita of the US in each of the 5 year periods starting from 1961-1965, and up to 2016-2020.

GDP_5year<-matrix(G[-c(1,length(G))],12,5,byrow = TRUE)
GDP_range_5year<-apply(GDP_5year,1,function(x) max(x)-min(x))
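When the function passed to apply() returns more than one value per row, the result is a matrix with one column per row of the input, which can be surprising. A small sketch on a hypothetical 3x2 matrix m (not the GDP data), using the base R function range(), which returns the minimum and the maximum:

```r
# Hypothetical matrix with 3 rows and 2 columns
m <- matrix(c(3, 9,
              8, 5,
              1, 6), nrow = 3, byrow = TRUE)

spread <- apply(m, 1, function(x) max(x) - min(x))   # one value per row
spread                                               # 6 3 5

min_max <- apply(m, 1, range)   # two values (min, max) per row
dim(min_max)                    # 2 3: first row holds minima, second row maxima
t(min_max)                      # transpose to get one row per row of m
```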

In the code above we applied a user-defined function on each row of the matrix. However, if the function has multiple lines, it may be inconvenient to write the function within the apply() function. In that case, we can define the function outside the apply() function.

Example: Find the five year periods starting from 1961-1965, and up to 2016-2020, within which the GDP per capita decreased in at least one year as compared to the previous year.

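#Returns 1 if the GDP per capita decreased in any year of the 5 year period as compared to the previous year, and 0 otherwise
GDP_inc<-function(GDP_5yr)
{
  dec<-0
  #Compare each of the 4 consecutive pairs of years within the period
  for(i in 1:4)
  {
    if(GDP_5yr[i+1]<GDP_5yr[i]){dec<-1}
  }
  return(dec)
}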

GDP_5year_mat<-matrix(G[-c(1,length(G))],12,5,byrow = TRUE)
years_inc_dec<-apply(GDP_5year_mat,1,GDP_inc)
five_year_periods<-seq(1960,2015,5)
print("Five year periods in which the GDP per capita decreased are those starting from the years:")
[1] "Five year periods in which the GDP per capita decreased are those starting from the years:"
print(five_year_periods[years_inc_dec==1]+1)
[1] 2006 2016
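The check for a decrease within a period can also be written without a for loop by combining the base R functions diff() and any(). The sketch below is an alternative under illustrative assumptions: the function name has_decrease and the small matrix periods are hypothetical and not part of the GDP example.

```r
# TRUE if any value in x is lower than the value just before it
has_decrease <- function(x) any(diff(x) < 0)

# Hypothetical matrix: one row per period, one column per year
periods <- matrix(c(10, 11, 12, 13,
                    20, 19, 21, 22,
                    30, 31, 32, 33), nrow = 3, byrow = TRUE)

dec <- apply(periods, 1, has_decrease)
dec          # FALSE TRUE FALSE
which(dec)   # 2, the row in which a decrease occurs
```

Unlike the loop with a fixed range 1:4, this version works for periods of any length.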

11.2.2 Practice exercise

Find the 5 year period in which the difference between the maximum GDP per capita and the minimum GDP per capita, as a percentage of the minimum GDP per capita, was the highest.